The video game FIFA, developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely due to its Ultimate Team game mode. The objective of Ultimate Team is to build the best team possible by buying and selling players and by buying packs of cards, much as people buy soccer trading cards in real life. Each player receives ratings in various categories based on their real-life abilities, and each of these ratings factors into their overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS) by selecting the best player at each position in each league based on how they performed in real life that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS choices are understandable, some confuse and occasionally anger fans, and EA has never explained how it makes its choices. Using machine learning methods and predictive modeling, we aim to determine which variables matter most when a player is chosen for TOTS, and to predict the Team of the Season for Europe’s top five leagues based on this season’s statistics.
Materials: We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from here. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 interrupted every league's season in March 2020.
Methods:
Using these datasets, we predicted Team of the Season players with a random forest machine learning model. Other models were tested, but we found that this method performed best. A random forest builds many decision trees from the data, each of which predicts whether a player will be in the Team of the Season based on the information we feed into it. It then combines the votes of all of those trees to decide whether or not a player should be in the Team of the Season. We can then apply the model to data that it did not see during training in order to check how good it really is.
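The voting idea can be illustrated with a minimal Python sketch. This is not the code behind our results, which came from a tidymodels workflow in R. It assumes a toy dataset of hypothetical (goals, team rank) pairs and uses one-split trees ("stumps") in place of full decision trees. The names `fit_stump`, `fit_forest`, and `predict` are illustrative.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No valid split exists: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=100, n_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical players: (goals, final table position of their team).
rows = [(20, 1), (18, 2), (15, 3), (22, 4), (2, 12),
        (1, 15), (3, 10), (0, 18), (4, 9), (2, 14)]
labels = ["TOTS"] * 4 + ["Normal"] * 6

forest = fit_forest(rows, labels)
print(predict(forest, (19, 2)))   # a high scorer on a top team
print(predict(forest, (1, 16)))   # a low scorer on a struggling team
```

Individual stumps can be wrong when their bootstrap sample happens to be unrepresentative, but the majority vote smooths over those errors, which is the main appeal of the method.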
Revision: Whether the card is “Normal” or “Team of the Season (TOTS)”
Int: Interceptions
TklW: Tackles Won
OG: Own Goals
PKcon: Penalties Conceded
MP: Matches Played
Min: Minutes
Gls: Goals
Ast: Assists
Non_PK_G: Non Penalty Goals (Goals from Open Play or Free Kicks)
PK: Penalty Kicks Scored
PKatt: Penalty Attempts
CrdY: Yellow Cards
CrdR: Red Cards
G_per90: Goals per 90 minutes
A_per90: Assists per 90 minutes
G_plus_A_per90: Goals plus Assists per 90 minutes
G_minus_Pk_per90: Non Penalty Goals per 90 minutes
Rk: Table Position
GF: Goals For (Goals your team has scored)
GA: Goals Against (Goals your team has conceded)
GD: Goal Difference (GF-GA)
Pts: Team Points for the Season (3 for a win, 1 for a draw, 0 for a loss)
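The derived columns follow directly from the raw counts. As a small worked example in Python (a sketch, not the code used to build the dataset), the per-90 rates divide a count by the number of 90-minute blocks played. The player figures are Kevin De Bruyne's 2016-2017 row from the misclassified-players table later in this report. The win and draw totals passed to `points` are an illustrative record, and the helper names are hypothetical.

```python
def per90(count, minutes):
    """Events per 90 minutes played, rounded to two decimals like the source data."""
    nineties = minutes / 90
    return round(count / nineties, 2)

def points(wins, draws):
    """League points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * wins + draws

# Kevin De Bruyne, 2016-2017: 2877 minutes, 6 goals, 17 assists.
minutes, goals, assists = 2877, 6, 17
g_per90 = per90(goals, minutes)
a_per90 = per90(assists, minutes)
g_plus_a_per90 = per90(goals + assists, minutes)
print(g_per90, a_per90, g_plus_a_per90)  # 0.19 0.53 0.72

# Team-level columns: GD = GF - GA, using his team's row (GF 80, GA 39).
goal_difference = 80 - 39
print(goal_difference)  # 41

# Illustrative record of 23 wins and 9 draws.
print(points(23, 9))  # 78
```

The printed per-90 values match the `G_per90`, `A_per90`, and `G_plus_A_per90` entries reported for that row.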
| League | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Premier League | 2.09 | 1.47 | 1.94 | 0.15 | 10.58 | 17.05 | 3.75 | 2.27 | 3.42 | 5.74 | 11.47 |
| La Liga | 2.05 | 1.41 | 1.85 | 0.20 | 10.60 | 16.80 | 3.89 | 2.04 | 3.46 | 5.82 | 10.62 |
| Ligue 1 | 1.95 | 1.30 | 1.75 | 0.21 | 10.49 | 16.94 | 3.67 | 1.95 | 3.17 | 5.77 | 11.09 |
| Bundesliga | 1.97 | 1.39 | 1.82 | 0.16 | 9.64 | 15.07 | 3.42 | 2.09 | 3.07 | 5.13 | 10.02 |
| Serie A | 2.05 | 1.35 | 1.85 | 0.19 | 10.61 | 16.51 | 3.73 | 2.06 | 3.32 | 5.77 | 11.07 |
The Premier League is widely considered the best league in the world: a league full of tradition and history that has seen many dominant teams and outstanding players. In recent years the league has generally been dominated by Manchester City and Liverpool, both of which have won league titles by large margins. With the influx of foreign money, the talent gap between the top and the bottom of the league has grown steadily, but the teams at the bottom continue to keep it competitive.
Before diving into modeling, we first must explore the data to observe basic trends. First, we looked at the proportion of Premier League cards that are given the TOTS designation. Below, we see that a select few cards are given the TOTS designation.
We also wanted to look at goals scored by TOTS players versus normal players. In this density plot, we are able to see that TOTS players score significantly more goals than regular players.
We also found that final table position and card status were strongly related: players with TOTS cards generally played for teams that finished high in the table. In the past three years, each Team of the Season has been filled largely with players from the top teams, and the density plot below reflects this.
Players who receive TOTS cards are usually the most important players to their teams, and because of this, play more minutes per contest. The density plot below is evidence of this fact.
Finally, the distribution of TOTS cards is expected to vary from league to league, so it is important to look at the distribution specific to the Premier League. In the Premier League, the position with the highest number of TOTS cards is striker.
Before modeling the data, we must split it into training and testing sets. The training data is what we give the model to learn from, while the testing data is what we use to evaluate the model. It is important that the Key Performance Indicators (KPIs) are similar in each dataset, as this indicates that the testing data is representative of the data the model learned from.
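A stratified split is one way to keep the rare TOTS class equally represented in both sets. The Python sketch below is an illustration under stated assumptions, not our actual pipeline: the 25% testing fraction, the generated player records, and the helper names `stratified_split` and `mean_goals` are all hypothetical.

```python
import random
from statistics import mean

def stratified_split(records, label_key, test_fraction=0.25, seed=42):
    """Split records so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    train, test = [], []
    for group in by_label.values():
        group = group[:]              # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def mean_goals(records):
    """One KPI (mean goals) per card revision, for comparing the two sets."""
    return {
        rev: round(mean([r["goals"] for r in records if r["revision"] == rev]), 2)
        for rev in ("Normal", "TOTS")
    }

# Hypothetical data: 80 normal cards and 20 TOTS cards.
rng = random.Random(0)
players = [{"revision": "Normal", "goals": rng.randint(0, 8)} for _ in range(80)]
players += [{"revision": "TOTS", "goals": rng.randint(5, 25)} for _ in range(20)]

train, test = stratified_split(players, "revision")
print(len(train), len(test))
print("training KPIs:", mean_goals(train))
print("testing KPIs: ", mean_goals(test))
```

If the KPIs of the two sets differ sharply, the split should be redrawn or the difference kept in mind when reading the testing metrics.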
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.055851 | 2.327128 | 2.781915 | 0.2739362 | 11.069149 | 26.94010 | 3.649960 | 2.360092 | 3.264401 | 5.455811 | 5.377800 |
| Normal | Testing | 3.200000 | 2.128000 | 2.872000 | 0.3280000 | 11.816000 | 27.45058 | 3.632292 | 2.094447 | 3.113384 | 5.492493 | 5.310808 |
| TOTS | Training | 8.942308 | 5.250000 | 8.269231 | 0.6730769 | 3.557692 | 31.76645 | 8.756876 | 4.167451 | 7.819321 | 3.268468 | 3.829581 |
| TOTS | Testing | 10.470588 | 7.352941 | 10.117647 | 0.3529412 | 4.058823 | 29.53987 | 8.768678 | 4.372373 | 8.388104 | 3.230143 | 4.999096 |
After seeing that the KPIs of our training and testing sets were similar, we created a random forest model to predict whether a player would be classified as TOTS or not. Our random forest model was made up of 100 decision trees. Each tree is trained on a different bootstrap sample of the data, which keeps the trees only weakly correlated and helps provide stability and accuracy to the model. We also fit a LASSO model, which shrinks the coefficients of less important explanatory variables to zero, to the training and testing data; however, we found that the random forest was more accurate.
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 428 rows 31 cols
## -> target variable : 428 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0 , mean = 0.1600099 , max = 1
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.9972222 , mean = -0.03851457 , max = 0.8262757
## A new explainer has been created!
Using our random forest model, we were able to observe which variables were most important to the model. It appears that goals against each player’s team, minutes played, and matches played were the most important.
Variable importance plot (VIP)
The confusion matrix below shows that 18 players were classified as TOTS. 9 of these players were correctly classified, while the model felt that 9 players who were not given TOTS cards should have been given one. It also felt that 8 players who were given TOTS cards should not have been given one.
## Truth
## Prediction Normal TOTS
## Normal 116 8
## TOTS 9 9
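The usual summary metrics can be read off the confusion matrix above. The short Python sketch below treats TOTS as the positive class; the variable names are our own, and the counts are copied from the matrix.

```python
# Counts from the Premier League confusion matrix (rows = prediction, columns = truth).
true_normal = 116   # predicted Normal, actually Normal
missed_tots = 8     # predicted Normal, actually TOTS (false negatives)
false_tots = 9      # predicted TOTS, actually Normal (false positives)
true_tots = 9       # predicted TOTS, actually TOTS

total = true_normal + missed_tots + false_tots + true_tots

accuracy = (true_normal + true_tots) / total       # share of all cards classified correctly
precision = true_tots / (true_tots + false_tots)   # share of predicted TOTS that were TOTS
recall = true_tots / (true_tots + missed_tots)     # share of real TOTS cards the model found

print(f"players in testing data: {total}")
print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
```

Because normal cards vastly outnumber TOTS cards, accuracy alone is flattering; precision and recall on the TOTS class give a more honest picture of how well the model picks a Team of the Season.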
Below are the players in the testing data that our model incorrectly classified. Many of these players were either undervalued or overvalued based on the performance of their team. It is clear that the choices for TOTS are somewhat subjective.
## Player revision position Int TklW OG PKcon Nation
## 1 Kevin De Bruyne 17 TOTS CM 24 35 0 0 BEL
## 2 Eric Dier 17 Normal CDM 37 34 0 0 ENG
## 3 Adam Lallana 17 TOTS CM 20 35 0 0 ENG
## 4 Sadio Mane 17 TOTS RW 11 18 0 1 SEN
## 5 Victor Moses 17 Normal RB 41 42 0 0 NGA
## 6 Paul Pogba 17 Normal CM 37 40 0 1 FRA
## 7 Victor Wanyama 17 Normal CDM 39 64 0 0 KEN
## 8 Philippe Coutinho 17 Normal LW 18 25 0 0 BRA
## 9 Sergio Aguero 18 TOTS ST 8 5 0 0 ARG
## 10 Eric Dier 18 Normal CB 30 35 0 0 ENG
## 11 Abdoulaye Doucoure 18 TOTS CDM 41 41 0 1 FRA
## 12 Andrew Robertson 18 TOTS LB 24 21 0 0 SCO
## 13 Antonio Valencia 18 Normal RB 43 37 0 0 ECU
## 14 Christian Eriksen 19 TOTS CAM 11 27 0 0 DEN
## 15 Harry Kane 19 Normal ST 4 7 0 0 ENG
## 16 James Maddison 19 TOTS CAM 12 34 0 0 ENG
## 17 Callum Wilson 19 Normal ST 1 9 0 0 ENG
## Squad Age Born MP Min minutes_played_divided_by90 Gls Ast
## 1 Manchester City 25 1991 36 2877 32.0 6 17
## 2 Tottenham 22 1994 36 3043 33.8 2 1
## 3 Liverpool 28 1988 31 2348 26.1 8 6
## 4 Liverpool 24 1992 27 2235 24.8 13 5
## 5 Chelsea 25 1990 34 2483 27.6 3 2
## 6 Manchester Utd 23 1993 30 2608 29.0 5 4
## 7 Tottenham 25 1991 36 3012 33.5 4 1
## 8 Liverpool 24 1992 31 2227 24.7 13 8
## 9 Manchester City 29 1988 25 1963 21.8 21 6
## 10 Tottenham 23 1994 34 2824 31.4 0 2
## 11 Watford 24 1993 37 3324 36.9 7 3
## 12 Liverpool 23 1994 22 1940 21.6 1 5
## 13 Manchester Utd 31 1985 31 2740 30.4 3 1
## 14 Tottenham 26 1992 35 2774 30.8 8 12
## 15 Tottenham 25 1993 28 2424 26.9 17 4
## 16 Leicester City 21 1996 36 2831 31.5 7 7
## 17 Bournemouth 26 1992 30 2528 28.1 14 9
## Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1 6 0 1 4 0 0.19 0.53 0.72 0.19
## 2 2 0 0 6 0 0.06 0.03 0.09 0.06
## 3 8 0 0 3 0 0.31 0.23 0.54 0.31
## 4 13 0 0 4 0 0.52 0.20 0.72 0.52
## 5 3 0 0 4 0 0.11 0.07 0.18 0.11
## 6 5 0 0 7 0 0.17 0.14 0.31 0.17
## 7 4 0 0 10 0 0.12 0.03 0.15 0.12
## 8 13 0 0 2 0 0.53 0.32 0.85 0.53
## 9 17 4 4 2 0 0.96 0.28 1.24 0.78
## 10 0 0 0 4 0 0.00 0.06 0.06 0.00
## 11 7 0 0 10 0 0.19 0.08 0.27 0.19
## 12 1 0 0 2 0 0.05 0.23 0.28 0.05
## 13 3 0 0 7 0 0.10 0.03 0.13 0.10
## 14 8 0 0 3 0 0.26 0.39 0.65 0.26
## 15 13 4 4 5 0 0.63 0.15 0.78 0.48
## 16 6 1 2 4 1 0.22 0.22 0.45 0.19
## 17 13 1 2 3 0 0.50 0.32 0.82 0.46
## G_plus_A_minus_PK_per90 Rk GF GA GD Pts Attendance .pred_Normal
## 1 0.72 3 80 39 41 78 54019 0.574806172
## 2 0.09 2 86 26 60 86 31639 0.004117647
## 3 0.54 4 78 42 36 76 53016 0.999473684
## 4 0.72 4 78 42 36 76 53016 0.598413893
## 5 0.18 1 85 33 52 93 41508 0.002777778
## 6 0.31 6 54 29 25 69 75290 0.076666667
## 7 0.15 2 86 26 60 86 31639 0.013367647
## 8 0.85 4 78 42 36 76 53016 0.123344971
## 9 1.05 1 106 27 79 100 54070 0.828775712
## 10 0.06 3 74 36 38 77 67953 0.010000000
## 11 0.27 14 44 64 -20 41 20231 0.972544576
## 12 0.28 4 84 38 46 75 53049 0.926223547
## 13 0.13 2 68 28 40 81 74976 0.021478758
## 14 0.65 4 67 39 28 71 54216 0.849877722
## 15 0.63 4 67 39 28 71 54216 0.348065973
## 16 0.41 9 51 48 3 52 31851 0.984237476
## 17 0.78 14 56 70 -14 45 10532 0.123344971
## .pred_TOTS .pred_class
## 1 0.4251938282 Normal
## 2 0.9958823529 TOTS
## 3 0.0005263158 Normal
## 4 0.4015861068 Normal
## 5 0.9972222222 TOTS
## 6 0.9233333333 TOTS
## 7 0.9866323529 TOTS
## 8 0.8766550291 TOTS
## 9 0.1712242882 Normal
## 10 0.9900000000 TOTS
## 11 0.0274554241 Normal
## 12 0.0737764531 Normal
## 13 0.9785212418 TOTS
## 14 0.1501222775 Normal
## 15 0.6519340269 TOTS
## 16 0.0157625241 Normal
## 17 0.8766550291 TOTS
## Warning: Novel levels found in column 'Nation': 'BFA', 'MKD', 'SKN', 'ZIM'. The
## levels have been removed, and values have been coerced to 'NA'.
Finally, we applied our model to the Premier League stats from the 2020-2021 season. The players who were chosen for TOTS are shown below.
| Player | Position | Squad | Matches Played | Starts | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Harry Kane | ST | Tottenham | 30 | 30 | 2632 | 21 | 13 | 7 | 53 | 0.9356553 | Starter |
| Ollie Watkins | ST | Aston Villa | 32 | 32 | 2880 | 12 | 4 | 11 | 45 | 0.9068442 | Starter |
| Timo Werner | ST | Chelsea | 31 | 25 | 2243 | 6 | 6 | 4 | 58 | 0.9019166 | Starter |
| Bruno Fernandes | CAM | Manchester Utd | 33 | 32 | 2821 | 16 | 11 | 2 | 67 | 0.9895714 | Starter |
| Rodri | CDM | Manchester City | 29 | 27 | 2353 | 2 | 1 | 1 | 77 | 0.9751046 | Starter |
| Marcus Rashford | LM | Manchester Utd | 33 | 31 | 2686 | 10 | 8 | 2 | 67 | 0.9692937 | Starter |
| Aaron Wan Bissaka | RB | Manchester Utd | 31 | 31 | 2790 | 2 | 2 | 2 | 67 | 0.9903896 | Starter |
| Harry Maguire | CB | Manchester Utd | 33 | 33 | 2970 | 2 | 1 | 2 | 67 | 0.9798704 | Starter |
| Matt Targett | LB | Aston Villa | 32 | 32 | 2864 | 0 | 1 | 11 | 45 | 0.9054538 | Starter |
| Ezri Konsa | CB | Aston Villa | 30 | 29 | 2656 | 2 | 0 | 11 | 45 | 0.9037871 | Starter |
| Mohamed Salah | RW | Liverpool | 32 | 29 | 2633 | 20 | 3 | 6 | 54 | 0.7402674 | Bench |
| Jamie Vardy | ST | Leicester City | 29 | 26 | 2401 | 13 | 8 | 3 | 62 | 0.5050886 | Bench |
| Mason Mount | CAM | Chelsea | 32 | 28 | 2545 | 6 | 4 | 4 | 58 | 0.9170196 | Bench |
| John McGinn | CM | Aston Villa | 31 | 31 | 2790 | 2 | 5 | 11 | 45 | 0.9095714 | Bench |
| Edouard Mendy | LB | Chelsea | 27 | 27 | 2430 | 0 | 0 | 4 | 58 | 0.8524394 | Bench |
Premier League Team of the Season
La Liga has been dominated for many years by Barcelona and Real Madrid, two of the most storied clubs in the world. For the past decade it has been the story of Messi vs Ronaldo, best vs best. These two clubs have won the most Champions League trophies in the last decade, and it is rare that one of them does not win the league. Outside of those two clubs the league somewhat struggles for talent, especially defensively, but the gap has narrowed somewhat in the last few years.
The first exploratory plot we looked at was the number of TOTS vs Normal cards, and as you can see there are not many team of the season players in the data set.
Then we looked at goals scored by TOTS and normal players, and while both the densities are low, TOTS players tend to score more goals.
Next, we have a density plot of table position for Team of the Season players and normal players. TOTS players tend to finish higher in the table.
Next, we have a density plot of minutes played for the team of the season players vs the normal players. Clearly the TOTS players play more.
This last exploratory plot shows us the distribution of the positions. As you can see there are not many center forwards, so those have been converted to strikers.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.988764 | 2.207865 | 2.705056 | 0.2837079 | 10.772472 | 26.22697 | 3.773874 | 1.999027 | 3.360093 | 5.336711 | 4.584873 |
| Normal | Testing | 2.838983 | 2.347458 | 2.550848 | 0.2881356 | 10.686441 | 26.31591 | 3.215801 | 2.419215 | 2.784565 | 5.743371 | 4.868298 |
| TOTS | Training | 8.958333 | 5.000000 | 7.520833 | 1.4375000 | 4.083333 | 29.57315 | 8.829251 | 3.695886 | 7.795224 | 3.923922 | 3.642296 |
| TOTS | Testing | 11.933333 | 4.733333 | 10.733333 | 1.2000000 | 6.333333 | 30.12296 | 12.831138 | 3.712270 | 11.516861 | 6.488084 | 4.007976 |
The plot below shows the importance of the variables in the La Liga model. The most important stats are “Minutes Played”, “Tackles Won”, and “Penalty Kicks Scored”.
The confusion matrix below shows that, in the testing data for La Liga, we classified 9 TOTS cards and 112 normal cards correctly and 12 cards incorrectly.
## Warning: Novel levels found in column 'Nation': 'ISR', 'MLI', 'NOR'. The levels
## have been removed, and values have been coerced to 'NA'.
Here is our predicted team of the season for La Liga 20/21:
La Liga Team of the Season
Generally considered the weakest of the top five European leagues, Ligue 1 has been completely dominated by PSG for many years. It is often called a “farmer’s league” and is sometimes not even counted among the best leagues in the world. However, there is no doubt that PSG is one of the best teams in the world: with the likes of Mbappe and Neymar, they reached the Champions League final last season and are currently in the semi-finals.
We began our modeling for Ligue 1 by joining the Ligue 1 datasets from 2017, 2018, and 2019.
We then began with exploratory plots. The first plot showed us how many players were given TOTS cards in the three combined datasets. We are able to see that once again only a small proportion of players are given TOTS cards.
Next, we looked at the density of goals scored between regular players and TOTS players. We were able to see that in general, a larger proportion of TOTS players score a higher number of goals.
Next, we looked at the density of table position by card type. We see that there is an even density of table position for normal cards, while the majority of TOTS players play for better teams.
We then looked at the density of minutes played per match and, unsurprisingly, players who are given TOTS cards tend to play more minutes per contest.
Finally, we looked at the distribution of TOTS cards by position. We are able to see that there is an overwhelming number of strikers and center backs in Ligue 1, so players who play in the center of the field dominate the distribution.
We also evaluated the metrics between the training and testing data to see if there was a significant difference between the two. For Ligue 1, there was not a significant difference in any of the important columns.
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.721449 | 2.069638 | 2.428969 | 0.2924791 | 10.944290 | 26.68521 | 3.426052 | 2.017562 | 2.966262 | 5.416942 | 4.902297 |
| Normal | Testing | 3.075630 | 2.025210 | 2.722689 | 0.3529412 | 11.613445 | 27.59384 | 3.751067 | 2.023013 | 3.244093 | 5.611896 | 5.420954 |
| TOTS | Training | 9.041667 | 4.645833 | 7.791667 | 1.2500000 | 3.562500 | 28.32431 | 8.409615 | 3.361165 | 7.070942 | 3.548486 | 4.997419 |
| TOTS | Testing | 9.933333 | 4.600000 | 8.000000 | 1.9333333 | 4.266667 | 30.13407 | 10.278040 | 4.239272 | 8.799351 | 4.333700 | 3.402039 |
## # A tibble: 502 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 RB 39 45 0 0 26 27 2149 0 3 0 0
## 2 LM 11 26 0 0 23 38 2476 3 5 3 0
## 3 RB 51 31 0 0 22 27 2324 0 1 0 0
## 4 ST 14 16 0 0 26 36 2225 10 4 10 0
## 5 CB 41 36 0 0 32 32 2635 1 0 1 0
## 6 RB 72 74 0 3 29 27 2395 0 1 0 0
## 7 RB 67 31 0 1 26 26 2198 1 1 1 0
## 8 CB 27 26 0 1 25 34 3015 1 0 1 0
## 9 RB 74 63 0 0 23 30 2646 0 4 0 0
## 10 LB 24 23 0 3 22 27 2121 0 4 0 0
## # ... with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## == Workflow ====================================================================
## Preprocessor: Recipe
## Model: rand_forest()
##
## -- Preprocessor ----------------------------------------------------------------
## 3 Recipe Steps
##
## * step_rm()
## * step_upsample()
## * step_mutate_at()
##
## -- Model -----------------------------------------------------------------------
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
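The recipe above includes `step_upsample()`, which balances the classes before the forest is trained, since TOTS cards are rare. A rough Python sketch of the idea follows. It assumes the minority class is grown by duplicating randomly chosen rows until both classes are the same size. The player counts are hypothetical, and `upsample` is an illustrative helper rather than a library function.

```python
import random
from collections import Counter

def upsample(records, label_key, seed=1):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(rec[label_key], []).append(rec)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the majority class size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Hypothetical training set: 90 normal cards and 10 TOTS cards.
training = [{"revision": "Normal", "id": i} for i in range(90)]
training += [{"revision": "TOTS", "id": 90 + i} for i in range(10)]

balanced = upsample(training, "revision")
print(Counter(r["revision"] for r in training))
print(Counter(r["revision"] for r in balanced))
```

Upsampling should be applied to the training data only. Duplicating rows in the testing data would distort the accuracy and confusion-matrix figures.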
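We then examined the performance of the candidate models across the five cross-validation folds. The table below reports the mean accuracy and ROC AUC for each of the nine hyperparameter combinations; the best mean accuracy, 92.6%, belongs to the model with mtry = 16 and min_n = 2 (Model4).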
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.917 5 0.0216 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.941 5 0.0169 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.907 5 0.0141 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.940 5 0.0168 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.902 5 0.0140 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.937 5 0.0186 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.926 5 0.0181 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.942 5 0.0221 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.921 5 0.0167 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.939 5 0.0227 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.902 5 0.0150 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.929 5 0.0245 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.919 5 0.0162 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.933 5 0.0250 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.904 5 0.0180 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.926 5 0.0257 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.897 5 0.0177 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.929 5 0.0244 Preprocessor1_Model9
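The `mean` and `std_err` columns in the tuning table summarize each hyperparameter combination across the five folds. The Python sketch below shows how such a summary can be computed and used to pick a configuration. It assumes the standard error is the sample standard deviation divided by the square root of the number of folds. The per-fold accuracies are hypothetical, because the report only shows the aggregated values.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

# The 3 x 3 grid of (mtry, min_n) values that appears in the tuning table.
grid = list(product((1, 16, 31), (2, 21, 40)))
print(len(grid), "candidate models")

# Hypothetical accuracy per fold for three of the candidates.
fold_accuracy = {
    (1, 2): [0.90, 0.92, 0.94, 0.91, 0.93],
    (16, 2): [0.93, 0.95, 0.91, 0.92, 0.94],
    (31, 40): [0.88, 0.90, 0.91, 0.89, 0.92],
}

def summarize(values):
    """Mean and standard error of a metric across folds."""
    return mean(values), stdev(values) / sqrt(len(values))

for params, values in fold_accuracy.items():
    avg, std_err = summarize(values)
    print(params, f"mean={avg:.3f}", f"std_err={std_err:.4f}")

# Choose the candidate with the highest mean accuracy.
best = max(fold_accuracy, key=lambda params: mean(fold_accuracy[params]))
best_mean, best_std_err = summarize(fold_accuracy[best])
print("best (mtry, min_n):", best)
```

When two candidates differ by less than about one standard error, as several rows in the table do, the simpler or faster model is usually a reasonable choice.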
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 407 rows 31 cols
## -> target variable : 407 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0 , mean = 0.1544963 , max = 1
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.87 , mean = -0.0365602 , max = 0.59
## A new explainer has been created!
In this model, the most important variables are minutes played, goal differential, and goals plus assists per 90 minutes. These three variables contribute to the card classification significantly more than the other variables.
After running the random forest model on the testing data, the accuracy comes out to about 88.1%, with a ROC AUC of 0.917, as the metrics below show. The remaining errors are likely due to many players outperforming their card rank, as well as many teams outperforming their projections.
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.881 Preprocessor1_Model1
## 2 roc_auc binary 0.917 Preprocessor1_Model1
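Overall, this model classified 20 players in the testing data as meeting our criteria for Team of the Season, 9 of them correctly, and misclassified 17 players in total (11 normal cards predicted as TOTS and 6 TOTS cards predicted as normal).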
## Truth
## Prediction Normal TOTS
## Normal 108 6
## TOTS 11 9
The misclassified players are shown below:
## Player revision position Int TklW OG PKcon Nation Squad
## 1 Lois Diony 17 17 TOTS ST 4 14 0 0 FRA Dijon
## 2 Blaise Matuidi 17 17 Normal CDM 40 42 0 0 FRA Paris S-G
## 3 Adrien Rabiot 17 17 Normal CM 38 46 0 0 FRA Paris S-G
## 4 Djibril Sidibe 17 17 Normal RB 47 52 0 1 FRA Monaco
## 5 Jemerson 17 17 Normal CB 54 51 0 0 BRA Monaco
## 6 Giovani Lo Celso 18 18 Normal CAM 20 59 0 0 ARG Paris S-G
## 7 Dimitri Payet 18 18 Normal LW 18 9 0 0 FRA Marseille
## 8 Alassane Plea 18 18 Normal ST 13 9 0 0 FRA Nice
## 9 Adil Rami 18 18 TOTS CB 33 20 1 0 FRA Marseille
## 10 Dani Alves 18 18 Normal RB 28 52 0 0 BRA Paris S-G
## 11 Radamel Falcao 18 18 TOTS ST 13 7 1 0 COL Monaco
## 12 Joao Moutinho 18 18 Normal CM 39 44 0 0 POR Monaco
## 13 Houssem Aouar 19 19 Normal CM 31 36 0 0 FRA Lyon
## 14 Kenny Lala 19 19 TOTS RB 29 43 0 1 FRA Strasbourg
## 15 Ferland Mendy 19 19 TOTS LB 25 30 0 1 FRA Lyon
## 16 Teji Savanier 19 19 TOTS CDM 44 63 0 0 FRA Nîmes
## 17 Zeki Celik 19 19 Normal RB 34 55 0 3 TUR Lille
## Age Born MP Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1 23 1992 35 2807 31.2 11 7 11 0 0 2
## 2 29 1987 34 2415 26.8 4 4 4 0 0 4
## 3 21 1995 27 1935 21.5 3 2 3 0 0 2
## 4 24 1992 29 2321 25.8 2 5 2 0 0 7
## 5 23 1992 34 3058 34.0 2 0 2 0 0 8
## 6 21 1996 33 1776 19.7 4 2 4 0 0 2
## 7 30 1987 31 2347 26.1 6 13 4 2 3 6
## 8 24 1993 35 3041 33.8 16 4 15 1 2 7
## 9 31 1985 33 2955 32.8 1 1 1 0 0 5
## 10 34 1983 25 2065 22.9 1 4 1 0 0 7
## 11 31 1986 26 2128 23.6 18 2 15 3 4 1
## 12 30 1986 33 2802 31.1 1 4 1 0 0 6
## 13 20 1998 37 3061 34.0 7 7 7 0 0 2
## 14 26 1991 34 3060 34.0 5 9 4 1 2 4
## 15 23 1995 30 2531 28.1 2 1 2 0 0 2
## 16 26 1991 32 2864 31.8 6 14 2 4 4 6
## 17 21 1997 34 2971 33.0 1 5 1 0 0 5
## CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1 1 0.35 0.22 0.58 0.35 0.58
## 2 0 0.15 0.15 0.30 0.15 0.30
## 3 0 0.14 0.09 0.23 0.14 0.23
## 4 0 0.08 0.19 0.27 0.08 0.27
## 5 2 0.06 0.00 0.06 0.06 0.06
## 6 0 0.20 0.10 0.30 0.20 0.30
## 7 0 0.23 0.50 0.73 0.15 0.65
## 8 0 0.47 0.12 0.59 0.44 0.56
## 9 0 0.03 0.03 0.06 0.03 0.06
## 10 1 0.04 0.17 0.22 0.04 0.22
## 11 0 0.76 0.08 0.85 0.63 0.72
## 12 0 0.03 0.13 0.16 0.03 0.16
## 13 0 0.21 0.21 0.41 0.21 0.41
## 14 0 0.15 0.26 0.41 0.12 0.38
## 15 0 0.07 0.04 0.11 0.07 0.11
## 16 1 0.19 0.44 0.63 0.06 0.50
## 17 1 0.03 0.15 0.18 0.03 0.18
## Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 16 46 58 -12 37 10126 0.710 0.290 Normal
## 2 2 83 27 56 87 45160 0.190 0.810 TOTS
## 3 2 83 27 56 87 45160 0.270 0.730 TOTS
## 4 1 107 31 76 95 9586 0.150 0.850 TOTS
## 5 1 107 31 76 95 9586 0.140 0.860 TOTS
## 6 1 108 29 79 93 46929 0.370 0.630 TOTS
## 7 4 80 47 33 77 46040 0.480 0.520 TOTS
## 8 8 53 52 1 54 22876 0.160 0.840 TOTS
## 9 4 80 47 33 77 46040 0.780 0.220 Normal
## 10 1 108 29 79 93 46929 0.160 0.840 TOTS
## 11 2 85 45 40 80 9243 0.785 0.215 Normal
## 12 2 85 45 40 80 9243 0.100 0.900 TOTS
## 13 3 70 47 23 72 49079 0.250 0.750 TOTS
## 14 11 58 48 10 49 25216 0.910 0.090 Normal
## 15 3 70 47 23 72 49079 0.740 0.260 Normal
## 16 9 57 58 -1 53 13994 0.600 0.400 Normal
## 17 2 68 33 35 75 34079 0.420 0.580 TOTS
## Warning: Novel levels found in column 'Nation': 'AUT', 'CAN', 'CHI', 'CRC',
## 'ECU', 'PER', 'SCO', 'ZIM'. The levels have been removed, and values have been
## coerced to 'NA'.
The model also shows that Jonathan Bamba, Idrissa Gana Gueye, Aurelien Tchouameni, Maxence Caqueret, and Ander Herrera are the top 5 midfielders in Ligue 1.
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Gaetan Laborde | ST | Montpellier | 34 | 2932 | 13 | 8 | 8 | 47 | 0.890 | Starter |
| Memphis Depay | CF | Lyon | 34 | 2653 | 18 | 9 | 4 | 67 | 0.840 | Starter |
| Kylian Mbappe | ST | Paris S-G | 29 | 2214 | 25 | 7 | 2 | 72 | 0.720 | Starter |
| Jonathan Bamba | LM | Lille | 34 | 2719 | 6 | 9 | 1 | 73 | 0.740 | Starter |
| Aurelien Tchouameni | CM | Monaco | 32 | 2703 | 2 | 4 | 3 | 71 | 0.580 | Starter |
| Ander Herrera | CM | Paris S-G | 27 | 1571 | 1 | 3 | 2 | 72 | 0.430 | Starter |
| Thomas Delaine | LB | Metz | 22 | 1600 | 3 | 1 | 10 | 43 | 0.420 | Starter |
| Leo Dubois | RB | Lyon | 33 | 2610 | 2 | 3 | 4 | 67 | 0.390 | Starter |
| Sven Botman | CB | Lille | 33 | 2949 | 0 | 0 | 1 | 73 | 0.385 | Starter |
| Damien Da Silva | CB | Rennes | 27 | 2397 | 4 | 0 | 7 | 54 | 0.380 | Starter |
| Kevin Volland | ST | Monaco | 31 | 2419 | 15 | 7 | 3 | 71 | 0.710 | Bench |
| Wissam Ben Yedder | ST | Monaco | 33 | 2266 | 18 | 5 | 3 | 71 | 0.650 | Bench |
| Benjamin Andre | CDM | Lille | 32 | 2643 | 0 | 1 | 1 | 73 | 0.420 | Bench |
| Gael Kakuta | CAM | Lens | 31 | 2217 | 11 | 5 | 5 | 56 | 0.420 | Bench |
| Leonardo Balerdi | CB | Marseille | 17 | 1363 | 2 | 0 | 6 | 55 | 0.320 | Bench |
Ligue 1 Team of the Season
Often called the league of the people because of its 50+1 rule, which requires a club's members to hold a majority of its voting rights, the German Bundesliga is considered the second best defensive league behind the Premier League. Bayern Munich have dominated the league for many years, often poaching the best players from other teams in the league.
First, we looked at how many TOTS players there are compared with normal players in our Bundesliga data set. As you can see, TOTS is a prestigious award given to only a small share of players.
Then we looked at the density of goals scored. For both TOTS and normal players the density peaks at a fairly low number of goals, but TOTS players tend to score more.
Next, we have a density plot of table position for TOTS versus normal players. As with the other leagues, TOTS players tend to play for teams that finish higher in the table.
Next, we have a density plot of the minutes played by normal cards versus TOTS cards, and it is clear that team of the season players play much more.
Lastly, we have the distribution of positions and which positions received team of the season cards. As you can see, there are not many wingers in the Bundesliga, so they have been converted to left and right mids.
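The table below compares the important stats for the training and testing data sets. For normal players the difference between the two is minimal. The TOTS rows differ more, most visibly in team rank and penalty goals, likely because so few TOTS players fall in the testing set.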
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.753571 | 2.142857 | 2.528571 | 0.2250000 | 9.821429 | 25.32901 | 3.279127 | 2.049815 | 2.971653 | 4.919084 | 3.963559 |
| Normal | Testing | 2.849462 | 2.064516 | 2.634409 | 0.2150538 | 10.075269 | 24.80585 | 3.653308 | 1.904265 | 3.209310 | 4.759970 | 3.341268 |
| TOTS | Training | 8.687500 | 5.437500 | 7.854167 | 0.8333333 | 3.750000 | 27.58588 | 7.754631 | 3.902570 | 6.866476 | 2.935476 | 3.970949 |
| TOTS | Testing | 9.625000 | 6.625000 | 8.062500 | 1.5625000 | 7.062500 | 26.82292 | 7.022583 | 3.827532 | 5.960635 | 4.106397 | 4.332935 |
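As a rough check on how closely the two splits match, the TOTS rows of the table above can be compared in units of the training standard deviation. This is a minimal base R sketch: the means and standard deviations are copied from the table, while the variable names and the choice of a standardized difference are assumptions made for illustration, not part of the original analysis.

```r
# TOTS means and training SDs copied from the Bundesliga table above.
# The vector names are illustrative only.
train_mean <- c(Goals = 8.687500, Assists = 5.437500, Team_Rank = 3.750000)
test_mean  <- c(Goals = 9.625000, Assists = 6.625000, Team_Rank = 7.062500)
train_sd   <- c(Goals = 7.754631, Assists = 3.902570, Team_Rank = 2.935476)

# Standardized difference: how many training SDs the testing mean
# sits away from the training mean.
std_diff <- (test_mean - train_mean) / train_sd
print(round(std_diff, 2))
```

A value near 0 means the two splits agree on that stat. Here Goals and Assists come out well under half a standard deviation, while Team Rank is more than one standard deviation apart, so the TOTS players in the testing set come from noticeably lower-ranked teams than those in training.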
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 328 rows 31 cols
## -> target variable : 328 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0.02071847 , mean = 0.2438431 , max = 0.9038921
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.8032962 , mean = -0.09750159 , max = 0.7312163
## A new explainer has been created!
Below is the variable importance plot for the Bundesliga model. The most important variables are “Minutes Played”, “Non Penalty Goals”, and “Assists”.
This shows how well the model predicted the testing data. It classified 7 TOTS players and 85 normal players correctly, and misclassified 17 players in total.
## Warning: Novel levels found in column 'Nation': 'ANG', 'ARM', 'BEN', 'BFA',
## 'BUL', 'CAN', 'ECU', 'FRO', 'MKD', 'WAL'. The levels have been removed, and
## values have been coerced to 'NA'.
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Wout Weghorst | ST | Wolfsburg | 31 | 2671 | 20 | 7 | 3 | 57 | 0.8402007 | Starter |
| Robert Lewandowski | ST | Bayern Munich | 26 | 2188 | 36 | 6 | 1 | 71 | 0.8085189 | Starter |
| Erling Haaland | ST | Dortmund | 26 | 2227 | 25 | 5 | 5 | 55 | 0.7750907 | Starter |
| Thomas Muller | CAM | Bayern Munich | 29 | 2453 | 10 | 17 | 1 | 71 | 0.8045416 | Starter |
| Leroy Sane | LM | Bayern Munich | 29 | 1672 | 4 | 9 | 1 | 71 | 0.7187507 | Starter |
| Joshua Kimmich | CDM | Bayern Munich | 24 | 1924 | 3 | 10 | 1 | 71 | 0.6848506 | Starter |
| David Alaba | CB | Bayern Munich | 29 | 2454 | 2 | 2 | 1 | 71 | 0.5641070 | Starter |
| Jerome Boateng | CB | Bayern Munich | 26 | 2148 | 1 | 1 | 1 | 71 | 0.5273112 | Starter |
| Ridle Baku | RB | Wolfsburg | 29 | 2409 | 6 | 4 | 3 | 57 | 0.4589220 | Starter |
| Angelino | LB | RB Leipzig | 24 | 2042 | 4 | 4 | 2 | 64 | 0.4514446 | Starter |
| Andre Silva | ST | Eint Frankfurt | 29 | 2490 | 25 | 6 | 4 | 56 | 0.7665250 | Bench |
| Andrej Kramaric | ST | Hoffenheim | 25 | 2109 | 17 | 3 | 11 | 36 | 0.5549401 | Bench |
| Dani Olmo | CAM | RB Leipzig | 31 | 2104 | 4 | 9 | 2 | 64 | 0.6773476 | Bench |
| Leon Goretzka | CM | Bayern Munich | 23 | 1695 | 5 | 5 | 1 | 71 | 0.6361432 | Bench |
| Willi Orban | CB | RB Leipzig | 26 | 2093 | 4 | 1 | 2 | 64 | 0.4441475 | Bench |
Bundesliga Team of the Season
Serie A has one of the richest histories in Europe, with AC Milan, Inter Milan, and Juventus all enjoying great success. In recent years, however, the league has been completely dominated by Juventus, who won 9 titles in a row before being stopped this year by Inter.
## Warning: Removed 19578 rows containing non-finite values (stat_bin).
First we made a bar chart to see the number of team of the season players in Serie A.
Next we made a density plot of goals. Team of the season players tend to score slightly more goals than normal players.
Then we made a density plot of team rank for team of the season players versus normal players. We can see that the teams of team of the season players finish much higher in the table.
Next we made a distribution plot of how much team of the season players play compared with normal players. As you can see, team of the season players tend to play a lot more.
We then made a plot of the positional breakdown of all the players. The players are heavily concentrated at center back, center mid, and striker.
Next we made a table to compare important stats for the training and testing data.
## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.172702 | 2.350975 | 2.883008 | 0.2896936 | 10.777159 | 27.01501 | 3.655291 | 2.215850 | 3.263599 | 5.538129 | 4.844356 |
| Normal | Testing | 2.873950 | 2.134454 | 2.563025 | 0.3109244 | 10.647059 | 26.81410 | 3.472787 | 2.306675 | 3.219745 | 5.453328 | 4.813499 |
| TOTS | Training | 10.339623 | 4.849057 | 9.301887 | 1.0377358 | 4.301887 | 29.78973 | 8.864240 | 3.307313 | 7.655026 | 3.220039 | 5.059081 |
| TOTS | Testing | 9.411765 | 5.176471 | 8.000000 | 1.4117647 | 3.882353 | 28.83987 | 7.080420 | 4.333522 | 5.623611 | 3.407388 | 5.063365 |
## # A tibble: 502 x 24
## position Int TklW OG PKcon Age MP Min Gls Ast Non_PK_G PK
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CB 51 32 0 2 27 37 3258 2 0 2 0
## 2 RW 33 27 0 0 26 35 2516 12 8 10 2
## 3 LB 15 24 0 0 31 24 1807 3 2 3 0
## 4 CM 40 38 0 0 32 34 2741 0 0 0 0
## 5 CDM 28 23 0 0 21 25 1803 0 1 0 0
## 6 CB 18 20 0 1 22 32 2880 0 1 0 0
## 7 CB 25 26 0 1 31 24 1927 0 0 0 0
## 8 ST 9 15 0 2 28 33 2719 11 0 11 0
## 9 LB 15 24 0 0 31 24 1807 3 2 3 0
## 10 CM 37 52 0 0 32 28 1984 0 4 0 0
## # ... with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## # CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## # G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## # Pts <dbl>, revision <fct>
## == Workflow ====================================================================
## Preprocessor: Recipe
## Model: rand_forest()
##
## -- Preprocessor ----------------------------------------------------------------
## 3 Recipe Steps
##
## * step_rm()
## * step_upsample()
## * step_mutate_at()
##
## -- Model -----------------------------------------------------------------------
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
## trees = 100
## min_n = tune()
##
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
## mtry min_n .metric .estimator mean n std_err .config
## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 1 2 accuracy binary 0.888 5 0.0153 Preprocessor1_Model1
## 2 1 2 roc_auc binary 0.904 5 0.0136 Preprocessor1_Model1
## 3 1 21 accuracy binary 0.896 5 0.0144 Preprocessor1_Model2
## 4 1 21 roc_auc binary 0.888 5 0.0171 Preprocessor1_Model2
## 5 1 40 accuracy binary 0.896 5 0.0112 Preprocessor1_Model3
## 6 1 40 roc_auc binary 0.890 5 0.0209 Preprocessor1_Model3
## 7 16 2 accuracy binary 0.872 5 0.0167 Preprocessor1_Model4
## 8 16 2 roc_auc binary 0.873 5 0.0185 Preprocessor1_Model4
## 9 16 21 accuracy binary 0.867 5 0.00506 Preprocessor1_Model5
## 10 16 21 roc_auc binary 0.876 5 0.0202 Preprocessor1_Model5
## 11 16 40 accuracy binary 0.864 5 0.0133 Preprocessor1_Model6
## 12 16 40 roc_auc binary 0.874 5 0.0232 Preprocessor1_Model6
## 13 31 2 accuracy binary 0.869 5 0.0147 Preprocessor1_Model7
## 14 31 2 roc_auc binary 0.874 5 0.0234 Preprocessor1_Model7
## 15 31 21 accuracy binary 0.852 5 0.0126 Preprocessor1_Model8
## 16 31 21 roc_auc binary 0.868 5 0.0218 Preprocessor1_Model8
## 17 31 40 accuracy binary 0.847 5 0.00889 Preprocessor1_Model9
## 18 31 40 roc_auc binary 0.866 5 0.0251 Preprocessor1_Model9
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 412 rows 31 cols
## -> target variable : 412 values
## -> predict function : yhat.workflow will be used ( default )
## -> predicted values : No value for predict function target column. ( default )
## -> model_info : package tidymodels , ver. 0.1.2 , task classification ( default )
## -> predicted values : numerical, min = 0.004085272 , mean = 0.2019935 , max = 0.9309736
## -> residual function : difference between y and yhat ( default )
## -> residuals : numerical, min = -0.6310856 , mean = -0.07335276 , max = 0.7620644
## A new explainer has been created!
Here is a plot of the most important variables in our Serie A model. “Minutes Played”, “Tackles Won”, and “Assists” appear to be the most important.
## # A tibble: 2 x 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.919 Preprocessor1_Model1
## 2 roc_auc binary 0.933 Preprocessor1_Model1
## Truth
## Prediction Normal TOTS
## Normal 116 8
## TOTS 3 9
Here is a confusion matrix of the predictions and true values for the testing data. As you can see, we predicted 9 team of the season players correctly and 10 incorrectly. While this is not great, it seems to be mostly ok because the predicted probabilities seem to be ordered fairly well.
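The counts in the confusion matrix printed just above this paragraph can be turned into a few summary rates. The following is a minimal base R sketch, assuming rows are predictions and columns are truth, as the output labels indicate. The variable names are illustrative and the majority-class baseline is added here only for comparison.

```r
# Counts copied from the printed confusion matrix
# (rows = Prediction, columns = Truth). matrix() fills column by column.
cm <- matrix(
  c(116, 3,   # Truth = Normal: predicted Normal, predicted TOTS
    8,   9),  # Truth = TOTS:   predicted Normal, predicted TOTS
  nrow = 2,
  dimnames = list(Prediction = c("Normal", "TOTS"),
                  Truth      = c("Normal", "TOTS"))
)

accuracy  <- sum(diag(cm)) / sum(cm)               # share of all players classified correctly
recall    <- cm["TOTS", "TOTS"] / sum(cm[, "TOTS"]) # share of true TOTS players that were found
precision <- cm["TOTS", "TOTS"] / sum(cm["TOTS", ]) # share of TOTS predictions that were right
baseline  <- sum(cm[, "Normal"]) / sum(cm)          # accuracy of always predicting "Normal"

print(round(c(accuracy = accuracy, recall = recall,
              precision = precision, baseline = baseline), 3))
```

The accuracy works out to 125/136, which matches the 0.919 reported in the metrics output. Because only 17 of the 136 testing players are TOTS, always predicting Normal would already score 119/136, so recall (9 of 17) and precision (9 of 12) say more about how well the TOTS class is handled than accuracy does.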
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Truth
## Prediction Normal TOTS
## Normal 117 8
## TOTS 2 9
Here are the players in the testing data that our model misclassified. As you can see, it is a wide variety of players: some were likely misclassified because of their position, others because of team performance, and others because of personal performance.
## Player revision position Int TklW OG PKcon Nation Squad
## 1 Mattia Caldara 17 TOTS CB 90 36 0 0 ITA Atalanta
## 2 Giorgio Chiellini 18 TOTS CB 28 15 0 0 ITA Juventus
## 3 Federico Chiesa 18 TOTS RW 9 37 0 0 ITA Fiorentina
## 4 Edin Dzeko 18 Normal ST 2 10 0 0 BIH Roma
## 5 Fabio Quagliarella 18 TOTS ST 8 9 0 0 ITA Sampdoria
## 6 Emre Can 19 TOTS CM 21 58 1 1 GER Juventus
## 7 Giorgio Chiellini 19 TOTS CB 23 9 0 0 ITA Juventus
## 8 Rodrigo De Paul 19 TOTS CM 36 31 0 1 ARG Udinese
## 9 Mario Mandzukic 19 Normal ST 12 22 0 0 CRO Juventus
## 10 Allan 19 TOTS CM 16 92 0 0 BRA Napoli
## Age Born MP Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1 22 1994 30 2655 29.5 7 0 7 0 0 4
## 2 32 1984 26 2161 24.0 0 1 0 0 0 2
## 3 19 1997 36 3012 33.5 6 4 6 0 0 7
## 4 31 1986 36 3018 33.5 16 3 16 0 0 6
## 5 34 1983 35 2719 30.2 19 5 12 7 8 4
## 6 24 1994 29 1811 20.1 4 1 3 1 1 7
## 7 33 1984 25 1991 22.1 1 1 1 0 0 3
## 8 24 1994 36 3189 35.4 9 9 6 3 6 7
## 9 32 1986 25 2014 22.4 9 6 9 0 0 4
## 10 27 1991 33 2616 29.1 1 3 1 0 0 10
## CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1 0 0.24 0.00 0.24 0.24 0.24
## 2 0 0.00 0.04 0.04 0.00 0.04
## 3 0 0.18 0.12 0.30 0.18 0.30
## 4 0 0.48 0.09 0.57 0.48 0.57
## 5 0 0.63 0.17 0.79 0.40 0.56
## 6 0 0.20 0.05 0.25 0.15 0.20
## 7 0 0.05 0.05 0.09 0.05 0.09
## 8 0 0.25 0.25 0.51 0.17 0.42
## 9 0 0.40 0.27 0.67 0.40 0.67
## 10 0 0.03 0.10 0.14 0.03 0.14
## Rk GF GA GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1 4 62 41 21 72 16948 0.6251494 0.3748506 Normal
## 2 1 86 24 62 95 39316 0.7727314 0.2272686 Normal
## 3 8 54 46 8 57 26092 0.6828040 0.3171960 Normal
## 4 3 61 28 33 77 37450 0.4780883 0.5219117 TOTS
## 5 10 56 60 -4 54 20156 0.6140780 0.3859220 Normal
## 6 1 70 30 40 90 37799 0.5843191 0.4156809 Normal
## 7 1 70 30 40 90 37799 0.7720459 0.2279541 Normal
## 8 12 39 53 -14 43 20414 0.7541297 0.2458703 Normal
## 9 1 70 30 40 90 37799 0.4838431 0.5161569 TOTS
## 10 2 74 36 38 79 29003 0.7326498 0.2673502 Normal
## Warning: Novel levels found in column 'Nation': 'ARM', 'EQG', 'RUS', 'UKR',
## 'USA', 'WAL'. The levels have been removed, and values have been coerced to
## 'NA'.
Here are the predicted team of the season players for Serie A this year:
| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Romelu Lukaku | ST | Inter | 32 | 2580 | 21 | 9 | 1 | 79 | 0.8076085 | Starter |
| Cristiano Ronaldo | ST | Juventus | 29 | 2463 | 25 | 2 | 3 | 66 | 0.7640680 | Starter |
| Lautaro Martinez | ST | Inter | 33 | 2238 | 15 | 5 | 1 | 79 | 0.6532790 | Starter |
| Ruslan Malinovskyi | CM | Atalanta | 31 | 1525 | 6 | 9 | 2 | 68 | 0.5491604 | Starter |
| Matteo Politano | RM | Napoli | 32 | 1696 | 9 | 4 | 4 | 66 | 0.5358178 | Starter |
| Piotr Zielinski | CM | Napoli | 31 | 2154 | 6 | 8 | 4 | 66 | 0.5233437 | Starter |
| Cristian Romero | CB | Atalanta | 26 | 2095 | 2 | 2 | 2 | 68 | 0.4763224 | Starter |
| Milan Skriniar | CB | Inter | 29 | 2507 | 3 | 0 | 1 | 79 | 0.3282628 | Starter |
| Juan Cuadrado | RB | Juventus | 25 | 1812 | 0 | 10 | 3 | 66 | 0.3059387 | Starter |
| Rafael Toloi | CB | Atalanta | 28 | 2283 | 2 | 0 | 2 | 68 | 0.2961210 | Starter |
| Duvan Zapata | ST | Atalanta | 32 | 2052 | 14 | 7 | 2 | 68 | 0.5800192 | Bench |
| Ciro Immobile | ST | Lazio | 30 | 2399 | 18 | 5 | 6 | 61 | 0.5494533 | Bench |
| Achraf Hakimi | RM | Inter | 32 | 2307 | 6 | 6 | 1 | 79 | 0.4933506 | Bench |
| Nicolo Barella | CM | Inter | 32 | 2596 | 3 | 5 | 1 | 79 | 0.4745268 | Bench |
| Jose Luis Palomino | CB | Atalanta | 31 | 2217 | 1 | 2 | 2 | 68 | 0.2645608 | Bench |
Serie A Team of the Season
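The report does not show how the Starter and Bench roles were assigned. One plausible reading is that players are ranked by predicted TOTS probability within a positional line and the top few take the starting slots. The base R sketch below illustrates that idea on the forwards and defenders from the table above. The probabilities are copied from the table, but the `line_of` grouping and the `slots` counts are assumptions made for this example, not the authors' rule.

```r
# Forwards and defenders copied from the Serie A table above.
players <- data.frame(
  player   = c("Romelu Lukaku", "Cristiano Ronaldo", "Lautaro Martinez",
               "Duvan Zapata", "Ciro Immobile", "Cristian Romero",
               "Milan Skriniar", "Juan Cuadrado", "Rafael Toloi",
               "Jose Luis Palomino"),
  position = c("ST", "ST", "ST", "ST", "ST", "CB", "CB", "RB", "CB", "CB"),
  prob     = c(0.8076085, 0.7640680, 0.6532790, 0.5800192, 0.5494533,
               0.4763224, 0.3282628, 0.3059387, 0.2961210, 0.2645608),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from position to line, and starting slots per line.
line_of <- c(ST = "FWD", CB = "DEF", RB = "DEF")
slots   <- c(FWD = 3, DEF = 4)

players$line <- unname(line_of[players$position])

# Rank within each line, highest probability first (rank 1 = most likely TOTS).
players$rank_in_line <- ave(-players$prob, players$line, FUN = rank)

# Players inside the slot count start; the rest go to the bench.
players$role <- ifelse(players$rank_in_line <= unname(slots[players$line]),
                       "Starter", "Bench")

print(players[order(players$line, players$rank_in_line),
              c("player", "line", "prob", "role")])
```

Under these assumed slot counts the sketch reproduces the roles shown in the table for these ten players, for example Juan Cuadrado starting ahead of Rafael Toloi's bench rival Jose Luis Palomino, but a different grouping of positions would give a different team.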
Here we show how Kevin De Bruyne would be modeled in each of the different leagues had he played in them, in order to demonstrate the similarities and differences between the models.
In all of the leagues he performs fairly well, but we can see that some of the models treat assists as a more important stat, which makes him do better. Some of the leagues also place more negative weight on the fact that he has played slightly less this season.
In conclusion, we found that this is something that is very hard to predict. Our models did not predict the binary outcome of TOTS or not reliably, but they did seem to order the predicted probabilities fairly well. The most useful stats for our models seemed to be how well the player’s team is doing and how much the player is playing. They used other stats fairly effectively as well, but they struggled to predict players that played well on worse teams. Thus these models likely couldn’t be used for much other than suggesting that much of what EA Sports does is subjective in terms of picking who gets these cards. Making these models reinforced our suspicion that there is no clear method to their madness. One interesting implication of this could be how getting or not getting one of these cards affects the public’s perception of the player. Are there players that should be more highly rated by soccer fans but aren’t because they didn’t get a team of the season card (and vice versa)?